{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Agenda\n",
      "\n",
      "- Define the problem and the approach\n",
      "- Data basics: loading data, looking at your data, basic commands\n",
      "- Handling missing values\n",
      "- Intro to scikit-learn\n",
      "- Grouping and aggregating data\n",
      "- Feature selection\n",
      "- Fitting and evaluating a model\n",
      "- <p style=\"color: red\">Deploying your work</p>\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#In this workbook you will\n",
      "- Deploy your model to Yhat\n",
      "- Autogenerate a webapp with your model's data\""
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Use the same code we wrote earlier"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "import numpy as np\n",
      "import pylab as pl"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "train = pd.read_csv(\"./data/credit-data-trainingset.csv\")\n",
      "test = pd.read_csv(\"./data/credit-data-testset.csv\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.tree import DecisionTreeClassifier\n",
      "features = ['revolving_utilization_of_unsecured_lines', 'debt_ratio',\n",
      "            'monthly_income', 'age', 'number_of_times90_days_late']\n",
      "clf = DecisionTreeClassifier(min_samples_leaf=1000)\n",
      "clf.fit(train[features], train.serious_dlqin2yrs)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "DecisionTreeClassifier(compute_importances=None, criterion='gini',\n",
        "            max_depth=None, max_features=None, min_density=None,\n",
        "            min_samples_leaf=1000, min_samples_split=2, random_state=None,\n",
        "            splitter='best')"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##You can use any custom function you want"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def convert_prob_to_score(p):\n",
      "    \"\"\"\n",
      "    takes a probability and converts it to a score\n",
      "    Example:\n",
      "        convert_prob_to_score(0.1)\n",
      "        466\n",
      "    \"\"\"\n",
      "    odds = (1 - p) / p\n",
      "    scores = np.log(odds)*(40/np.log(2)) + 340\n",
      "    return scores.astype(np.int)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "##Deploying to Yhat\n",
      "from yhat import BaseModel, Yhat\n",
      "\n",
      "\n",
      "yh = Yhat(\"YOUR USERNAME\", \"YOUR APIKEY\")\n",
      "\n",
      "class LoanModel(BaseModel):\n",
      "    def transform(self, newdata):\n",
      "        df = pd.DataFrame(newdata)\n",
      "        return df\n",
      "\n",
      "    def predict(self, df):\n",
      "        data = df[self.features]\n",
      "        result = {}\n",
      "        p = self.clf.predict_proba(data)\n",
      "        p = p[::, 1]\n",
      "        score = convert_prob_to_score(p)\n",
      "        result[\"prob\"] = p\n",
      "        result[\"score\"] = score\n",
      "        return result\n",
      "\n",
      "\n",
      "loan_model = LoanModel(clf=clf, features=features,\n",
      "                 udfs=[convert_prob_to_score])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 44
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Executing the model\n",
      "####Raw Data > `transform` > `predict` > Back to Client"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "testcase = test.head(1).to_dict('list')\n",
      "transform_result = loan_model.transform(testcase)\n",
      "loan_model.predict(transform_result)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 49,
       "text": [
        "{'prob': array([ 0.13092551]), 'score': array([449])}"
       ]
      }
     ],
     "prompt_number": 49
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "yh.deploy(\"DataGothamLoanModel\", loan_model)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "uploading... "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "done!\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 46,
       "text": [
        "{u'modelname': u'DataGothamLoanModel', u'status': u'success', u'version': 4}"
       ]
      }
     ],
     "prompt_number": 46
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Create a test case with one of the holdout samples and run it through your api. See if the results match"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "testcase = test.head(100).tail(1)\n",
      "yh.predict(\"DataGothamLoanModel\", 4, testcase.to_dict('list'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 50,
       "text": [
        "{u'prob': [0.01905], u'score': [567]}"
       ]
      }
     ],
     "prompt_number": 50
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "prob = clf.predict_proba(testcase[features])\n",
      "convert_prob_to_score(prob[::,1])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 51,
       "text": [
        "array([567])"
       ]
      }
     ],
     "prompt_number": 51
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Generate a single page app with your model/data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "yh.document(\"DataGothamLoanModel\", 4, train[features])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 48,
       "text": [
        "{u'docsURL': u'http://yhathq.com/models/greg/5231bb2cb517e814428ab2b0',\n",
        " u'success': True}"
       ]
      }
     ],
     "prompt_number": 48
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##We just did the following\n",
      "\n",
      "- Deployed our model to Yhat\n",
      "- Switched models on the fly\n",
      "- Generated a webapp for our model using `yhat.document`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}